{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2 图像卷积\n",
    "上节课我们简单介绍了卷积和卷积层的基本知识。考虑到图像卷积是理解卷积神经网络的关键，本节咱们更详细的阐述相关细节内容，尤其是实际应用和代码实现，帮助你加深理解。先来看看卷积和互运算两个概念。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.1 卷积运算 vs 互相关运算\n",
    "\n",
    "在图像卷积中，这两个数学概念经常容易混淆。互相关运算（cross-correlation）的数学定义如下：\n",
    "\n",
    "$$(I * K)(x,y)=\\sum_{s=-\\infty}^{\\infty}\\sum_{t=-\\infty}^{\\infty}I(x+s,y+t) \\cdot K(s,t)$$\n",
    "\n",
    "其中，$I$ 是输入图像，$K$ 是卷积核，$*$ 是互相关运算的符号，$(x,y)$ 是输出图像中的像素坐标。你可能发现了，这个公式怎么特别像上一小节图像卷积的公式呢？是的，这也是让很多同学混乱的地方，它们的确就是一回事。稍等咱们就解释清楚。再来看看真正的卷积运算（convolution）数学公式：\n",
    "\n",
    "$$(I * K)(x,y)=\\sum_{s=-\\infty}^{\\infty}\\sum_{t=-\\infty}^{\\infty}I(s,t) \\cdot K(x-s,y-t)$$\n",
    "\n",
    "其中，$I$ 是输入图像，$K$ 是卷积核，$*$ 是卷积运算的符号，$(x,y)$ 是输出图像中的像素坐标。\n",
    "\n",
    "二者是不是长的很像，只差了一个正负号，卷积和互相关相比加号变成了减号。严格说在数学上它们是不同的运算，但是在图像处理领域，两者差别就没那么明显了，或者说反而产生了混用现象。我们用一个例子来加以说明。先看互相关运算，下图左边为原始图像，它和一个卷积核做互相关运算后得到右边的输出图像。\n",
    "<img src=\"../images/7-2-1.png\" width=\"70%\"></img>\n",
    "卷积核在左边图像上滑动，同时做互相关运算，也就是对应元素相乘在相加。由于原图像多数元素都为0，只在输入图像中心为1，因此最后输出了右边图像。结果显示相当于把卷积核进行了180度翻转。\n",
    "\n",
    "再来看卷积运算，公式中它和互相关相比改变了符号，体现在图像上就是先把卷积核进行了180度翻转，或者说就是先左右翻转，然后上下翻转，然后再进行互相关运算。如下图所示：\n",
    "<img src=\"../images/7-2-2.png\" width=\"70%\"></img>\n",
    "\n",
    "由于卷积运算相当于翻转了两次，因此最后输出图像中还是123456789这样的顺序，而互相关运算只翻转了一次，最终结果变成了987654321这样的形式。\n",
    "\n",
    "虽然两种运算看似不同，但由于卷积核是自己定的，本身没有顺序上的限制，所以完全可以用翻转过的卷积核，这样以来二者就没啥区别了。也正是因为这个原因，大部分深度学习框架在代码实现的时候偷了懒，干脆用了互相关运算替代了卷积。因此严格意义上说，卷积层用的是互相关运算，但依然被叫成了卷积。这就像是双胞胎，明明是老大干活，却总被叫成老二一样，不过对外人而言区别不大。本书后续课程，将遵循惯例，不再对二者做明确区分。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[[[0., 0., 0., 0., 0., 0., 0.],\n",
      "          [0., 0., 0., 0., 0., 0., 0.],\n",
      "          [0., 0., 9., 8., 7., 0., 0.],\n",
      "          [0., 0., 6., 5., 4., 0., 0.],\n",
      "          [0., 0., 3., 2., 1., 0., 0.],\n",
      "          [0., 0., 0., 0., 0., 0., 0.],\n",
      "          [0., 0., 0., 0., 0., 0., 0.]]]])\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "# 定义输入图像和卷积核\n",
    "image = torch.zeros(7, 7, dtype=torch.float32)\n",
    "\n",
    "# 将中心位置设置为 1\n",
    "image[3][3] = 1.0\n",
    "\n",
    "kernel = torch.tensor([[1, 2, 3],\n",
    "                       [4, 5, 6],\n",
    "                       [7, 8, 9]],dtype=torch.float32)\n",
    "\n",
    "# 计算卷积\n",
    "result = F.conv2d(image.unsqueeze(0).unsqueeze(0), kernel.unsqueeze(0).unsqueeze(0), padding = 1)\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "结果和前面图示中完全一致，注意在计算卷积时我们填充了一圈0，以便保证输出图像大小与输入一致。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.2 卷积层\n",
    "\n",
    "卷积层对输入和卷积核权重进行互相关运算，同时加入偏置。因此，卷积层训练的目标是找到卷积核权重和偏置这些参数。我们下面来看下如何用pytorch实现一个二维图像卷积运算的训练过程。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 1, 28, 28])\n",
      "OrderedDict([('weight', tensor([[[[-0.0327, -0.1699, -0.0087],\n",
      "          [-0.0618,  0.1318,  0.1570],\n",
      "          [ 0.1586, -0.0503, -0.2356]]]])), ('bias', tensor([-0.0731]))])\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "# 定义数据\n",
    "input_tensor = torch.randn(1, 1, 28, 28)\n",
    "target_tensor = torch.randn(1, 1, 28, 28)\n",
    "\n",
    "# 定义卷积层\n",
    "conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1)\n",
    "\n",
    "# 计算卷积\n",
    "output_tensor = conv(input_tensor)\n",
    "print(output_tensor.shape)  # 输出: torch.Size([1, 1, 28, 28])\n",
    "\n",
    "# 定义损失函数和优化器\n",
    "loss_fn = nn.MSELoss()\n",
    "optimizer = torch.optim.Adam(conv.parameters(), lr=0.001)\n",
    "\n",
    "# 定义训练循环\n",
    "for epoch in range(100):\n",
    "    # 计算损失\n",
    "    output_tensor = conv(input_tensor)\n",
    "    loss = loss_fn(output_tensor, target_tensor)\n",
    "\n",
    "    # 清空梯度\n",
    "    optimizer.zero_grad()\n",
    "\n",
    "    # 计算梯度并更新权重\n",
    "    loss.backward()\n",
    "    optimizer.step()\n",
    "    \n",
    "print(conv.state_dict())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个例子中，我们定义了输入输出张量，并使用 nn.Conv2d 类定义了一个卷积层。然后我们使用这个卷积层来计算卷积，输出的张量的形状为 (1, 1, 28, 28)。接下来，我们定义了一个损失函数和优化器，并在循环中不断地计算损失、清空梯度、计算梯度和更新权重。输出结果是一个字典，其中包含了卷积层的权重和偏置项。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.3 卷积层特征\n",
    "\n",
    "卷积层的特征映射（feature map）是指卷积层对输入数据进行卷积后得到的输出张量。特征映射中的每个元素都是输入张量中相应区域的卷积和，其中区域的大小和形状由卷积层的内核大小决定。\n",
    "\n",
    "例如，假设输入张量的大小为 $H \\times W$，内核大小为 $k \\times k$，那么卷积层的输出张量大小就为 $\\frac{H - k + 2p}{s} + 1 \\times \\frac{W - k + 2p}{s} + 1$，其中 $p$ 是 padding 的大小，$s$ 是 stride 的大小。\n",
    "\n",
    "卷积层的感受野（receptive field）是指输入张量中卷积层特征映射中的每个元素所能“感受”到的区域。例如，假设输入张量的大小为 $H \\times W$，内核大小为 $k \\times k$，那么卷积层的感受野大小就为 $k \\times k$。\n",
    "\n",
    "举个例子，假设输入张量是一张 $5 \\times 5$ 的图像，卷积层使用 $3 \\times 3$ 的内核，stride 为 $1$，padding 为 $0$，那么卷积层的输出张量大小为 $3 \\times 3$，感受野大小为 $3 \\times 3$。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.4 卷积层应用示例\n",
    "\n",
    "如何用卷积层来实现滤波器的效果呢？我们来演示一个具体的示例。创建一个 8x8 的 tensor，得到黑白相间的图像。然后创建卷积核进行卷积运算，从而实现边缘检测效果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPUAAAD4CAYAAAA0L6C7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8+yak3AAAACXBIWXMAAAsTAAALEwEAmpwYAAAKlElEQVR4nO3d3Ytc9R3H8c+nUWl9qF5kWyQJTQoSsL2Ju1gkIFSxxCrai15EUKgUcmWItCDau/4DYi+KEKJWMFVKVBARraBihda6iSk1iZY02GZTbVaK9eEmpH56sRNYdR/OzJ4zM/vd9wsWd2cnk+9B3555OOd3nEQA6vjKqAcA0C6iBoohaqAYogaKIWqgmPO6eND169dn8+bNXTz0lxw8eHAof48kTU5ODu3vkupuW9Xtkoa7bUm80O3u4iOtqampTE9Pt/64C7EX3K5ODPvjv6rbVnW7pKFv24J/GU+/gWKIGiiGqIFiiBoohqiBYogaKIaogWKIGiiGqIFiGkVte4ftd2wft31v10MBGNyyUdteJ+nXkm6UdKWk22xf2fVgAAbTZE99taTjSU4kOSPpCUm3djsWgEE1iXqDpJPzfp7p3fY5tnfZnrY9PTs729Z8APrU2htlSfYmmUoyNTEx0dbDAuhTk6hPSdo07+eNvdsAjKEmUb8h6QrbW2xfIGmnpGe6HQvAoJZd+STJWdt3SXpB0jpJDyc50vlkAAbSaDmjJM9Jeq7jWQC0gCPKgGKIGiiGqIFiiBoohqiBYogaKIaogWI6uUKHba5kD3SMK3QAawRRA8UQNVAMUQPFEDVQDFEDxRA1UAxRA8UQNVAMUQPFNLlCx8O2T9t+axgDAViZJnvq30ja0fEcAFqybNRJXpX0nyHMAqAFjVYTbcL2Lkm72no8AINpdOql7c2Snk3y3UYPyqmXQOc49RJYI4gaKKbJR1qPS/qjpK22Z2z/tPuxAAyK5YyAVYrX1MAaQdRAMUQNFEPUQDFEDRRD1EAxRA0U09oJHfNNTk5qenq6i4f+EnvBj+o60cVn+kupum1Vt0sa7rYthj01UAxRA8UQNVAMUQPFEDVQDFEDxRA1UAxRA8UQNVAMUQPFNFmjbJPtl20ftX3E9p5hDAZgME2O/T4r6edJDtm+RNJB2y8mOdrxbAAG0OSyO+8lOdT7/mNJxyRt6HowAIPp6zV170od2yS9vsDvdtmetj09Ozvb0ngA+tU4atsXS3pS0t1JPvri75PsTTKVZGpiYqLNGQH0oVHUts/XXND7kzzV7UgAVqLJu9+W9JCkY0nu734kACvRZE+9XdIdkq6zfbj39cOO5wIwoGU/0krymqTRr9ECoBGOKAOKIWqgGKIGiiFqoBiiBoohaqAYogaKIWqgGHdxrSHbw72AEbAGJVnwoDD21EAxRA0UQ9RAMUQNFEPUQDFEDRRD1EAxRA0UQ9RAMU0WHvyq7T/b/kvvsju/HMZgAAaz7GGivdVEL0rySW+p4Nck7UnypyX+DIeJAh1b7DDRJgsPRtInvR/P730RLTCmmi7mv872YUmnJb2YZMnL7rQ8I4A+9HWWlu3LJD0taXeSt5a4H3tyoGOtnKWV5ENJL0va0cJMADrQ5N3vid4eWra/JukGSW93PBeAATW56Pzlkh61vU5z/xP4XZJnux0LwKBY+QRYpVj5BFgjiBoohqiBYogaKIaogWKIGiiGqIFiiBoopskRZX2bnJzU9PRwTtaaO917OLo4UGcpVbet6nZJw922xbCnBoohaqAYogaKIWqgGKIGiiFqoBiiBoohaqAYogaKIWqgmMZR9xb0f9M2iw4CY6yfPfUeSce6GgRAO5pedmejpJsk7et2HAAr1XRP/YCkeyR9ttgd5l9La3Z2to3ZAAygyRU6bpZ0OsnBpe6XZG+SqSRTExMTrQ0IoD9N9tTbJd1i+11JT0i6zvZjnU4FYGDLRp3kviQbk2yWtFPSS0lu73wyAAPhc2qgmL6WM0ryiqRXOpkEQCvYUwPFEDVQDFEDxRA1UAxRA8UQNVAMUQPFuIvLktge7rVOgDUoyYLX+GFPDRRD1EAxRA0UQ9RAMUQNFEPUQDFEDRRD1EAxRA0UQ9RAMY2WM+qtJPqxpP9JOptkqsuhAAyunzXKvp/kg84mAdAKnn4DxTSNOpJ+b/ug7V0L3WH+ZXfaGw9Avxqdeml7Q5JTtr8h6UVJu5O8usT9OfUS6NiKTr1Mcqr3z9OSnpZ0dXujAWhTkwvkXWT7knPfS/qBpLe6HgzAYJq8+/1NSU/bPnf/3yZ5vtOpAAyM5YyAVYrljIA1gqiBYogaKIaogWKIGiiGqIFiiBoopp9TLxubnJzU9PRwzuvoHRQzFF18pr+UqttWdbuk4W7bYthTA8UQNVAMUQPFEDVQDFEDxRA1UAxRA8UQNVAMUQPFEDVQTKOobV9m+4Dtt20fs31N14MBGEzTY79/Jen5JD+2fYGkCzucCcAKLBu17UslXSvpJ5KU5IykM92OBWBQTZ5+b5E0K+kR22/a3tdb//tz5l92Z3Z2tvVBATTTJOrzJF0l6cEk2yR9KuneL94pyd4kU0mmJiYmWh4TQFNNop6RNJPk9d7PBzQXOYAxtGzUSd6XdNL21t5N10s62ulUAAbW9N3v3ZL29975PiHpzu5GArASjaJOcljSVLejAGgDR5QBxRA1UAxRA8UQNVAMUQPFEDVQDFEDxRA1UIy7uNaQ7eFewAhYg5IseOEu9tRAMUQNFEPUQDFEDRRD1EAxRA0UQ9RAMUQNFEPUQDHLRm17q+3D874+sn33EGYDMIC+DhO1vU7SKUnfS/KPJe7HYaJAx9o6TPR6SX9fKmgAo9V0ieBzdkp6fKFf2N4ladeKJwKwIo2ffvfW/P6XpO8k+fcy9+XpN9CxNp5+3yjp0HJBAxitfqK+TYs89QYwPho9/e5duvafkr6d5L8N7s/Tb6Bjiz39ZuUTYJVi5RNgjSBqoBiiBoohaqAYogaKIWqgGKIGiiFqoJh+z9Jq6gNJ/Z6eub735yqqum1s1+h8a7FfdHJE2SBsTyeZGvUcXai6bWzXeOLpN1AMUQPFjFPUe0c9QIeqbhvbNYbG5jU1gHaM054aQAuIGihmLKK2vcP2O7aP27531PO0wfYm2y/bPmr7iO09o56pTbbX2X7T9rOjnqVNti+zfcD227aP2b5m1DP1a+SvqXsXCPibpBskzUh6Q9JtSY6OdLAVsn25pMuTHLJ9iaSDkn602rfrHNs/kzQl6etJbh71PG2x/aikPyTZ11tB98IkH454rL6Mw576aknHk5xIckbSE5JuHfFMK5bkvSSHet9/LOmYpA2jnaodtjdKuknSvlHP0ibbl0q6VtJDkpTkzGoLWhqPqDdIOjnv5xkV+Y//HNubJW2T9PqIR2nLA5LukfTZiOdo2xZJs5Ie6b202NdbdHNVGYeoS7N9saQnJd2d5KNRz7NStm+WdDrJwVHP0oHzJF0l6cEk2yR9KmnVvcczDlGfkrRp3s8be7eterbP11zQ+5M8Nep5WrJd0i2239XcS6XrbD822pFaMyNpJsm5Z1QHNBf5qjIOUb8h6QrbW3pvTOyU9MyIZ1ox29bca7NjSe4f9TxtSXJfko1JNmvu39VLSW4f8VitSPK+pJO2t/Zuul7Sqntjs6tTLxtLctb2XZJekLRO0sNJjox4rDZsl3SHpL/aPty77RdJnhvdSGhgt6T9vR3MCUl3jnievo38Iy0A7RqHp98AWkTUQDFEDRRD1EAxRA0UQ9RAMUQNFPN/YKzQne1wvoUAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[[[-1.,  1., -1.,  1., -1.,  1., -1.],\n",
      "          [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],\n",
      "          [-1.,  1., -1.,  1., -1.,  1., -1.],\n",
      "          [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],\n",
      "          [-1.,  1., -1.,  1., -1.,  1., -1.],\n",
      "          [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],\n",
      "          [-1.,  1., -1.,  1., -1.,  1., -1.],\n",
      "          [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]]]])\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# 创建 8x8 张量\n",
    "image = torch.zeros(8, 8)\n",
    "image[::2, ::2] = 1\n",
    "# image[1::2, 1::2] = 0\n",
    "\n",
    "# 将 image 张量转换为 numpy 数组\n",
    "image_np = image.numpy()\n",
    "\n",
    "# 使用 matplotlib 的 imshow 函数将图像显示出来\n",
    "plt.imshow(image_np, cmap='gray')\n",
    "plt.show()\n",
    "\n",
    "# 创建卷积核\n",
    "kernel = torch.tensor([[-1.0, 1.0]])\n",
    "\n",
    "# 使用卷积核对 image 进行卷积\n",
    "output = F.conv2d(image.unsqueeze(0).unsqueeze(0), kernel.unsqueeze(0).unsqueeze(0))\n",
    "\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "输出结果是个8x7张量，其中每个位置的值是1或-1的地方显示了检测到的边缘，可以看到全黑色的行中没有边缘，因此所有元素都为0。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2.5 卷积核可以学习获得吗？\n",
    "\n",
    "在前面的例子当中，卷积核（也称内核）都是手工指定的。你有没有想过，它可以通过数据训练得到吗？答案是可以的。我们来看一个例子。首先，需要定义训练数据和标签。例如，假设你有一个 5x5 的图像和一个 3x3 的标签，则可以使用如下的代码将它们转换为 PyTorch 张量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[[[-0.1686,  0.0775,  0.1561],\n",
      "          [-0.0111,  0.1880,  0.2825],\n",
      "          [-0.1479,  0.1595,  0.0020]]]])\n",
      "tensor([0.0165])\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "# 创建训练数据和标签\n",
    "data = torch.zeros(1, 1, 5, 5)\n",
    "label = torch.zeros(1, 1, 3, 3)\n",
    "\n",
    "# 将训练数据和标签转换为 PyTorch 张量\n",
    "data = data.float()\n",
    "label = label.float()\n",
    "\n",
    "import torch.nn as nn\n",
    "\n",
    "# 创建卷积层，输入通道数为 1，输出通道数为 1\n",
    "conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)\n",
    "\n",
    "# 定义损失函数和优化器\n",
    "loss_fn = nn.MSELoss()\n",
    "optimizer = torch.optim.SGD(conv.parameters(), lr=0.001)\n",
    "\n",
    "# 训练模型\n",
    "for i in range(1000):\n",
    "    # 计算预测值\n",
    "    output = conv(data)\n",
    "\n",
    "    # 计算损失\n",
    "    loss = loss_fn(output, label)\n",
    "\n",
    "    # 清空梯度\n",
    "    optimizer.zero_grad()\n",
    "\n",
    "    # 反向传播\n",
    "    loss.backward()\n",
    "\n",
    "    # 优化模型参数\n",
    "    optimizer.step()\n",
    "\n",
    "# 访问卷积层的卷积核\n",
    "conv_kernel = conv.weight.data\n",
    "print(conv_kernel)\n",
    "\n",
    "# 访问卷积层的偏置项\n",
    "conv_bias = conv.bias.data\n",
    "print(conv_bias)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Next 7-3 常用操作](./7-3%20常用操作.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
